DOCUMENT RESUME

ED 070 661                    24                    SE 015 470

AUTHOR          Voelker, Alan M.; Harris, Margaret L.
TITLE           Measuring Science Concept Attainment of Elementary
                School Boys and Girls.
INSTITUTION     Wisconsin Univ., Madison. Research and Development
                Center for Cognitive Learning.
SPONS AGENCY    National Center for Educational Research and
                Development (DHEW/OE), Washington, D.C.
REPORT NO       WRDCCL-TR-197
BUREAU NO       BR-5-0216
PUB DATE        Nov 71
CONTRACT        OEC-5-10-154
NOTE            35p.
EDRS PRICE      MF-$0.65 HC-$3.29
DESCRIPTORS     Classification; Conceptual Schemes; Educational
                Research; *Elementary School Science; *Evaluation;
                Science Education; Scientific Concepts; *Test
                Construction; Tests
IDENTIFIERS     Research Reports



ABSTRACT 

Test items were developed for assessing the mastery
of 30 selected science concepts on classification. These concepts
were drawn from the areas of physical, biological, and earth
sciences. A schema of twelve test items was developed for each
concept. Procedures used in the construction and revision of these
test items are described. The tests were given to beginning sixth
grade children and the publication includes most of the statistical
data. Separate data analyses are presented for boys and girls.
(PS)
STATEMENT OF FOCUS



The Wisconsin Research and Development Center for Cognitive Learning
focuses on contributing to a better understanding of cognitive learning by
children and youth and to the improvement of related educational practices.
The strategy for research and development is comprehensive. It includes
basic research to generate new knowledge about the conditions and processes
of learning and about the processes of instruction, and the subsequent
development of research-based instructional materials, many of which are
designed for use by teachers and others for use by students. These materials
are tested and refined in school settings. Throughout these operations
behavioral scientists, curriculum experts, academic scholars, and school
people interact, insuring that the results of Center activities are based
soundly on a knowledge of subject matter and cognitive learning and that
they are applied to the improvement of educational practice.

This Technical Report is from the Project on the Structure of Concept
Attainment Abilities in Program 1 and from the Quality Verification Program.
The Quality Verification Program assisted in developing tests to measure
concept achievement and identifying reference tests for cognitive abilities,
while the Concept Attainment staff took primary initiative in identifying
basic concepts in science at intermediate grade levels. The tests will be
used to study the relationships among cognitive abilities and learned
concepts in various subject matter areas. The outcome of the Project will be
a formulation of a model of structure of abilities in concept attainment in
a number of subjects, including mathematics, social studies, and language
arts, as well as science.
Abstract



The procedures employed in constructing and revising
12-item tests of concept attainment for 30 selected science
concepts are described. The total score and individual item
statistics for data collected on a group of beginning sixth
grade children are presented and discussed. Separate data
analyses are presented for the boys and the girls.



I 

Introduction



The project entitled "A Structure of Concept Attainment Abilities"
(to be referred to as the CAA Project) has as its primary objective
the formulation of one or more models or structures of concept
attainment ability. The development of the model or models is to be
based on the collection of actual data.

In the fields of both educational psychology and science education,
much discussion in the literature has been devoted to classifying
and/or defining types of concepts. According to Klausmeier, Harris,
Davis, Schwenn, and Frayer (1968), concepts may be defined in
one or more of four ways:

1. Structurally, in terms of perceptible or readily specifiable
   properties or attributes,

2. Systematically, in terms of synonyms or antonyms,

3. Operationally, in terms of the procedures used to distinguish
   the concept from other concepts, or

4. Axiomatically, in terms of logical or numerical relationships.

Bourne (1966, p. 1) states that "a concept
exists whenever two or more distinguishable
objects or events have been grouped or classified
together and set apart from objects on the
basis of some common feature or property of
each." A comparison of this definition with
the definitions stated above reveals a marked
similarity between it and a concept defined
structurally.

The roots of both these definitions seem to be based in the
scientific process of classification. As such, concepts which are
formulated as a result of this classification process might be
identified as members of a group of concepts known as classificatory
concepts. It is concepts of this variety with which this project is
concerned. A classificatory concept is defined as a concept which has
three characteristics:

1. There is more than one example of the concept,

2. The properties (attributes) of the concept can be described, and

3. The concept can be labeled (named) by a word or compound word.

This definition served as the basis for the selection and analysis
of the subject-matter concepts.

The major steps in pursuing the objective of formulating models for
the attainment of classificatory concepts are as follows:

1. To identify basic concepts in language arts, mathematics,
   science, and social studies appropriate at the fourth-grade level,

2. To develop tests to measure achievement of these concepts,

3. To identify reference tests for measuring cognitive abilities, and

4. To study the relationship among learned concepts in these four
   subject matter fields and the identified cognitive abilities.

In the assessment of concept attainment, the learner may be asked to
demonstrate his attainment of a concept by performing a wide variety
of tasks. The ability to perform one or more tasks is usually taken as
the criterial evidence that a student does or does not understand a
concept. Because of the nature of this project, it was necessary to
have agreement on the nature of the tasks that would be employed for
making assessments about the level of attainment of the selected
concepts. In this way the task variable would be controlled across the
subject matter disciplines when instruments for measuring concept
attainment were constructed. The schema selected to fulfill this
requirement was developed by Frayer, Fredrick, and Klausmeier (1969).
The "Schema for Testing the Level of Concept Mastery" consists of 13
types of questions, each of which purportedly gives an indication of
some level of understanding of a concept. Each task elicits a
different type of performance from the examinee. Twelve of these 13
kinds of questions were selected for use in the study:

 1. Given the name of an attribute, select an example of the attribute.

 2. Given an example of an attribute, select the name of the attribute.

 3. Given the name of a concept, select an example of the concept.

 4. Given the name of a concept, select a nonexample of the concept.

 5. Given an example of a concept, select the name of the concept.

 6. Given the name of a concept, select the relevant attribute.

 7. Given the name of a concept, select the irrelevant attribute.

 8. Given the definition of a concept, select the name of the concept.

 9. Given the name of a concept, select the definition of the concept.

10. Given the name of a concept, select the supraordinate concept.

11. Given the name of a concept, select the subordinate concept.

12. Given the names of two concepts, select the relationship
    between them.

From the nature of the tasks identified above, 
it is obvious that for the purposes of this study 
a multiple-choice format which requires an 
examinee to select an answer was preferable 
to a testing format which requires an examinee 
to produce an answer. The schema, however, 
is sufficiently flexible to permit either of these 
kinds of data gathering procedures to be util- 
ized. 

It is thus the purpose of this technical report to provide a
description of the test development effort for the science concepts to
be included in the study. It is a report of one aspect of step two in
the overall study.



II 

Procedures 



Concept Selection 

As indicated earlier, one of the basic prerequisites to conducting
this study was to have all concepts be representative of a general
class of concepts. The class of concepts chosen was identified as
classificatory concepts, those which can be represented by a single
or a compound word. The second prerequisite to identifying the
concepts for inclusion in the study was that they should measure the
level of mastery of concepts commonly taught in the elementary school
at the intermediate grade levels. The third prerequisite to selecting
the concepts was that they should represent major topical areas
pertinent to the subject-matter discipline. Three organizational
patterns for selecting topical areas were considered. The one that
lent itself most appropriately for use in this study was to group the
concepts into earth science, biological science, and physical science
areas. A more detailed description of the rationale and procedures for
selecting the concepts can be found in Working Paper No. 57 (Voelker,
Sorenson, & Frayer, 1971).

The initial lists of concepts from each of the three topical areas
are presented in Table 1. A total of 30 concepts was selected for
inclusion in the study by randomly sampling 10 concepts from each of
the three topical areas. The 30 concepts for which tests of concept
attainment were developed are identified with an asterisk.
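
The sampling step described above can be sketched as follows. This is
only an illustration, not the procedure the project staff actually
ran: the function name, the use of a seeded generator, and the seed
value are assumptions, and the draw will not reproduce the asterisked
concepts in Table 1. The candidate pools are the lists from Table 1.

```python
import random

# Candidate concepts by topical area, copied from Table 1.
POOLS = {
    "biological": [
        "Adaptation", "Amphibian", "Animal", "Bird", "Brain", "Cell", "Eardrum",
        "Environment", "Fish", "Heart", "Hibernate", "Invertebrate", "Lens - eye",
        "Ligament", "Lungs", "Mammal", "Muscle", "Nervous system", "Optic nerve",
        "Plant", "Pore", "Reptile", "Retina", "Sense", "Skeleton", "Survival",
        "Vertebrate", "Water",
    ],
    "earth": [
        "Air Pressure", "Atmosphere", "Cloud", "Core", "Crust", "Fossil", "Glacier",
        "Igneous rock", "Magma", "Mantle", "Metamorphic rock", "Meteor", "Meteorite",
        "Mineral", "Moon", "Orbit", "Planet", "Season", "Sedimentary rock",
        "Solar system", "Star", "Sun", "Volcano", "Weather", "Wind",
    ],
    "physical": [
        "Burning", "Condensation", "Conductor", "Contraction", "Degree", "Dissolve",
        "Evaporation", "Expansion", "Force", "Friction", "Fuel", "Gas", "Liquid",
        "Magnet", "Matter", "Melting", "Molecular movement", "Molecule",
        "Non-conductor", "Solid", "Sound", "Temperature", "Thermometer", "Work",
    ],
}


def sample_concepts(pools, per_area=10, seed=None):
    """Draw `per_area` concepts without replacement from each topical area."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return {area: sorted(rng.sample(names, per_area)) for area, names in pools.items()}


# Hypothetical seed; 3 areas x 10 concepts = 30 concepts in total.
selected = sample_concepts(POOLS, per_area=10, seed=1971)
```

Sampling within each area separately, rather than from the pooled
list, is what guarantees equal representation of the three areas.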

For each of these 30 concepts, a comprehensive concept analysis was
completed. The framework for conducting the analysis can be found in
Working Paper No. 16 (Frayer et al., 1969). The information produced
by a concept analysis includes:

1. Relevant and irrelevant attributes of the concept,

2. Supraordinate, coordinate, and subordinate concepts,

3. Criterial attributes of the concept,

4. A definition of the concept,

5. Examples and nonexamples of the concept, and

6. Relationships between and among the concept and other concepts.

A detailed analysis of the 30 concepts included in this study can be
found in the appendix of Working Paper No. 57 (Voelker et al., 1971).



Test Development 

The information provided by the concept analyses served as the basis
for constructing the test items to be used in data collection. For
each of the 30 concepts included in the study, a 12-item test was
constructed, each of the 12 items per test corresponding to one of the
tasks of the schema. This activity resulted in the production of 360
items for use in assessing science concept attainment.

If one studies the tasks being used to measure understanding of a
concept, it can readily be seen that more than one item can be
generated for some of the tasks. For example, a Task 1 type item
(given the name of an attribute, select an example of the attribute)
could be constructed to measure understanding of each of the relevant
attributes of a concept. It was decided for this project to construct
just one multiple-choice item for each task for each concept. This
made it necessary to have bases for making choices when such choices
were necessary. These bases consisted of principles for selecting
attributes, relationships, incorrect choices, etc. A discussion of
such bases can be found in "A Structure of Concept Attainment
Abilities: The Problem and Strategies for Attacking It" (Harris,
Harris, Frayer, & Quilling, in press).

Table 1

Lists of Science Concepts by Topical Area

Biological Science     Earth Science          Physical Science

Adaptation             Air Pressure           Burning
Amphibian              Atmosphere             Condensation
Animal                 *Cloud                 *Conductor
*Bird                  *Core                  Contraction
Brain                  Crust                  Degree
*Cell                  *Fossil                Dissolve
Eardrum                *Glacier               *Evaporation
Environment            Igneous rock           *Expansion
*Fish                  Magma                  Force
*Heart                 Mantle                 *Friction
Hibernate              Metamorphic rock       Fuel
*Invertebrate          *Meteor                Gas
*Lens - eye            Meteorite              *Liquid
Ligament               Mineral                Magnet
*Lungs                 *Moon                  Matter
*Mammal                Orbit                  *Melting
*Muscle                *Planet                Molecular movement
Nervous system         Season                 *Molecule
Optic nerve            *Sedimentary rock      Non-conductor
Plant                  Solar system           *Solid
*Pore                  Star                   *Sound
Reptile                Sun                    Temperature
Retina                 *Volcano               *Thermometer
Sense                  Weather                Work
Skeleton               *Wind
Survival
Vertebrate
Water

*Indicates that a test was developed and administered for this concept.

The initial draft of each item was prepared by a science education
specialist and reviewed by the principal investigator for science.
When agreement on the appropriateness of the items was reached between
these two parties, the items were further reviewed by a group of
graduate students in science education, each a specialist in
biological science, earth science, and/or physical science. Their
suggestions were considered in making further revisions in the items.
These revised items were then critiqued by a committee composed of
item writers from each of the four subject matter areas being studied
(science, social studies, language arts, and mathematics), an
experienced elementary school teacher with a specialty in reading, and
a measurement specialist. The final critique was conducted by the
principal investigator for science and a measurement specialist. Major
concerns in the item construction process were readability of the
items, validity, and reliability.

Readability 

Each item was specifically constructed to minimize the chance that a
student would be unable to answer a question because of his inability
to read the item. Care was taken to use the simplest possible language
and still be scientifically accurate at the level of a child in the
intermediate grades. Some assistance in meeting this criterion of
readability was obtained from the analysis of the concepts. All
attributes, examples, nonexamples, and the concept definition were
stated in terms that a fifth-grade child could be expected to read and
understand.

Pilot studies were conducted to determine whether children of the
respective age level could recognize the concept labels, as judged by
their ability to read the words aloud and explain whether they knew
something about the concepts. The evidence obtained from these pilot
studies and the independent review of sample items by outside
consultants indicated that there would be little if any reading
problem with the items, and that there was no need to administer them
in any way other than the standard one in which the students read the
items themselves.



Validity 

The content validity of each of the items 
was of immediate concern during item construc- 
tion; aspects of construct validity were to be 
probed later using duplicate test construction, 
simplex analyses, and factor analyses of the 
results obtained using the content-valid items 
constructed. 



Content Validity 

Each item was constructed to meet the content and task specifications
set for it. The schema adopted for use in measuring concept attainment
specified the nature of the task to be performed by the student for a
respective item. The content for each item was specified as a result
of the prior concept analysis. It is recognized that the content
specifications were not as precise as the task specifications because
of the necessity of choosing a single attribute to be tested and the
necessity of selecting incorrect alternatives for use in the
construction of multiple-choice questions, for example. However,
systematic construction of alternate choices was used whenever
possible.

To further insure the content validity of the items, two persons who
were familiar with the schema for assessing concept attainment but who
were not involved in the item development process classified five
random sets of 72 items, the items for six concepts in each set,
according to content and task. Concept analyses were available to
these persons at that time. They were able to correctly classify all
but a few of the items. Any questions raised by this process were
mutually resolved with the subject matter principal investigator, the
measurement specialist, and the reviewer.



Reliability 

Utilizing the schema as the basis for preparing the 30-concept test
provided a control over the kinds of items constructed. This control
over the nature of the items resulted in a 12 (tasks) by 30 (concepts)
matrix consisting of a score for each of the 360 items. In addition to
a single test score based on the 360 items, it is possible to obtain
two other types of test scores from a completely crossed design of
this nature. One alternative is to have 30 concept test scores based
on 12-item tests. The other alternative is to have 12 task type scores
based on items selected for each of the 30 concepts. The total score
for all 360 items was rejected since it assumes that neither task nor
concept variation is present. Rather than make a choice between the
other two alternatives, both were done. (The theoretical problem of
how to item analyze a completely crossed design of this variety
remains to be solved.)
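
The two kinds of scores obtainable from the crossed design can be
sketched as row and column sums of one examinee's item-score matrix.
The layout (concepts as rows, tasks as columns), the function names,
and the example examinee are assumptions made for illustration; the
report does not describe how the scores were tabulated.

```python
N_CONCEPTS, N_TASKS = 30, 12


def concept_scores(matrix):
    """One score per concept: the number right out of that concept's 12 items."""
    return [sum(row) for row in matrix]


def task_scores(matrix):
    """One score per task type: the number right out of that task's 30 items."""
    return [sum(col) for col in zip(*matrix)]


# Hypothetical examinee: every item correct (1) except all Task 12 items (0).
student = [[1] * (N_TASKS - 1) + [0] for _ in range(N_CONCEPTS)]

by_concept = concept_scores(student)  # 30 scores, each 11 for this examinee
by_task = task_scores(student)        # 12 scores: 30 for Tasks 1-11, 0 for Task 12
```

Both sets of scores are built from the same 360 item responses, which
is why every item later appears in two item analyses.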

The major concern was that the reliability for the concept test
scores and the task test scores be sufficiently high to warrant
further study using these data. It was recognized that there may be
some contradiction in what was attempted. The items were constructed
to comply with the completely crossed design, 30 concepts by 12 tasks.
One major objective of the project is to determine the dimensionality
of the selected science concepts. If either or both of the two modes
(concepts and tasks) were not unidimensional, then an internal
consistency reliability estimate based upon items measuring aspects
of several dimensions would reflect this; the more dimensions present
and the more uncorrelated they are, the lower the internal consistency
estimate. Recognizing this and not being able to study the
dimensionality of the two modes until after the items were developed,
pilot studies were conducted using the items for all the concepts for
the 12 tasks. As will be pointed out later, evidence indicates that
sufficiently reliable scores can be obtained for both task scores and
concept scores.
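
As a rough illustration of the kind of internal consistency estimate
at issue, the sketch below computes a Hoyt reliability from a persons
by items matrix of 0/1 scores using the analysis-of-variance
definition, (MS persons - MS residual) / MS persons, together with a
standard error of measurement. The function names and the small data
set are hypothetical, and the use of the sample (n - 1) standard
deviation for the standard error is an assumption; the report does not
say how GITAP computed these values.

```python
import math
import statistics


def hoyt_reliability(scores):
    """Hoyt (ANOVA) reliability for a persons x items matrix of item scores."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_persons - ms_residual) / ms_persons


def standard_error_of_measurement(scores):
    """SEM = s * sqrt(1 - r); s is the sample SD of total scores (assumption)."""
    totals = [sum(row) for row in scores]
    return statistics.stdev(totals) * math.sqrt(1.0 - hoyt_reliability(scores))


# Hypothetical data: 5 examinees by 4 items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
r = hoyt_reliability(data)
sem = standard_error_of_measurement(data)
```

For dichotomously scored items this estimate is algebraically the same
as coefficient alpha, so it drops as the items in a test tap more, and
less correlated, dimensions.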

Item Revision 

A pilot study was conducted to gather data to be utilized in
revision of the items and the tests prior to conducting the actual
study. A total of five 72-item tests was administered to every
subject. Each test consisted of the 12 items for six of the 30
concepts. Items were randomly assigned to position on the tests to
minimize a probable learning effect that might have taken place had
the items been presented in order on individual concept tests. (If one
looks at the ordering of the schema, it is obvious what is meant by
the potential for the learning effect when the items for individual
concepts are presented in sequence.)
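
The random assignment of items to positions on one 72-item form can be
sketched as a shuffle of (concept, task) pairs. The report does not
say which six concepts appeared together on a form or how the
randomization was carried out, so the concept grouping, the function
name, and the seed below are purely illustrative.

```python
import random


def build_form(concepts, n_tasks=12, seed=None):
    """Return the (concept, task) items of one form in random order."""
    items = [(concept, task) for concept in concepts for task in range(1, n_tasks + 1)]
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    rng.shuffle(items)
    return items


# Hypothetical grouping of six of the 30 selected concepts on one form.
form_concepts = ["Bird", "Cell", "Fish", "Cloud", "Core", "Liquid"]
form = build_form(form_concepts, seed=0)  # 6 concepts x 12 tasks = 72 items
```

Shuffling across the whole form, rather than within each concept,
keeps the 12 items for any one concept from appearing in schema order.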

The pilot study activities were conducted during January, April, and
May of 1970 in Sussex and Brookfield, Wisconsin. One of the five test
instruments was administered in Sussex and the remaining four test
instruments were administered to children in Brookfield. The
Brookfield testing was divided into two periods during which two of
the four instruments were administered. Approximately 100 students
took each of the five concept tests. Data collected from this pilot
study were then utilized to revise the test items. However, the
students in the Brookfield groups were predominantly at the 90th
percentile on the Iowa Tests of Basic Skills (General), which resulted
in skewed results on the tests. Therefore, compensations had to be
made when utilizing the results of the tests in revising the items. In
addition to examining the results on individual items, the
reliabilities were computed for concept tests and task tests to be
used in revising items.

The scores on the tests were subjected to analysis by the
Generalized Item Analysis Program (GITAP) (Baker, 1969). The output
from this program provides the Hoyt reliability and standard error of
measurement for the total test scores, and in addition provides four
kinds of information about the individual test items:

1. The proportion responding to each possible item response. The
   proportion of students who respond correctly to an item is an
   index of the difficulty level of that item. The greater the value
   of this difficulty index, the easier the item.

2. The item-criterion biserial correlation. The biserial correlation
   coefficient is an index of the discriminating ability of the item
   choice. For these analyses the criterion ability used was total
   concept or total task score.

3. X50. X50 is the point on the criterion scale, given in standard
   deviation units, corresponding to the median of the item
   characteristic curve. It is the point at which subjects with that
   score have a 50-50 chance of choosing that response.

4. Beta. Beta is the reciprocal of the standard deviation of the item
   characteristic curve at the X50 point. It is an index of the
   discrimination power of the item.
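
The four indices listed above can be approximated for a single
response choice as in the sketch below. It is not GITAP's code: the
biserial formula is the textbook one, and the conversions X50 = z / r
and beta = r / sqrt(1 - r^2) are the standard normal-ogive
approximations, assumed here because they agree with the report's
remark that beta cannot be computed once the biserial r reaches 1.00.
The function name, the population (n) standard deviation, and the
example data are likewise assumptions.

```python
import math
from statistics import NormalDist, mean, pstdev

_STD_NORMAL = NormalDist()


def item_choice_stats(chose, criterion):
    """chose[i] is 1 if examinee i picked this response, else 0;
    criterion[i] is that examinee's total concept or total task score."""
    p = sum(chose) / len(chose)              # proportion choosing the response
    sd = pstdev(criterion)
    if p in (0.0, 1.0) or sd == 0:           # no contrast: indices undefined
        return {"p": p, "r_bis": None, "x50": None, "beta": None}
    q = 1.0 - p
    m1 = mean(c for c, x in zip(criterion, chose) if x)
    m0 = mean(c for c, x in zip(criterion, chose) if not x)
    z_cut = _STD_NORMAL.inv_cdf(q)           # deviate with proportion p above it
    y = _STD_NORMAL.pdf(z_cut)               # normal ordinate at that deviate
    r_bis = (m1 - m0) / sd * (p * q / y)     # biserial correlation
    x50 = z_cut / r_bis if r_bis != 0 else None
    beta = r_bis / math.sqrt(1.0 - r_bis ** 2) if abs(r_bis) < 1 else None
    return {"p": p, "r_bis": r_bis, "x50": x50, "beta": beta}


# Hypothetical data: six examinees, three of whom chose the response.
stats = item_choice_stats(chose=[1, 1, 1, 0, 0, 0], criterion=[5, 4, 3, 3, 2, 1])
```

For a correct choice, a larger `p` means an easier item and a negative
`x50` means that examinees below the criterion mean already have an
even chance of answering correctly.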

As indicated earlier, there is no known way to item analyze
completely crossed data such as are produced in the design of this
study. Therefore, the items were analyzed as part of a concept test
score and as part of a task test score. This raises questions as to
the interpretation of such results. The main referents used for
interpreting the results and as a basis for making item revisions were
the results obtained from the analyses of the concept scores. The
tasks were fixed, and thus any arbitrary decisions were made in regard
to appropriate content for incorrect choices, etc. The usual standards
for item indices were not strictly adhered to, as a unique design for
item analysis was being dealt with and a major object of the project
is to study the dimensionality of the concepts and of the tasks. If
high discrimination indices had been demanded, the dimensionality
might have been affected by making the items more homogeneous. Also,
no attempt was made to manipulate the difficulty level of the items,
since another objective of the project is to determine if any
differential levels of difficulty or complexity exist in the concepts
and in the tasks. Therefore, the item analysis results were used as a
very general guide to help in determining whether there were "hidden"
weaknesses, clues, and/or incongruities in the items and, in a more
general sense, to show that what we were attempting to do was
possible: sufficiently reliable concept and task scores could be
attained when using this completely crossed design.

The general criteria (guidelines) used in 
item revision were as follows: 

1. Biserial r: A high positive biserial r is desired for a correct
   choice and a high negative biserial r is desired for the incorrect
   choices. The general level of biserial r sought for the task scores
   was greater than or equal to .3 for the correct choice and less
   than or equal to -.3 for the incorrect choices. For the concept
   test scores the limit on the biserial r was increased.
Table 2

Mean Scores and Standard Deviations on
Lorge-Thorndike Intelligence Test and Iowa Tests of
Basic Skills for Students in Population and Samples

Test                                    Population     Boys     Girls

Lorge-Thorndike Intelligence    Mean      106.6       106.11    112.23
                                s                      14.82     13.37
                                N         2605          161       239

Iowa Basic Skills
  Vocabulary                    Mean        5.53        5.54      5.88
                                s                       1.41      1.33
                                N         2520          181       246

  Reading Comprehension         Mean        5.44        5.29      5.97
                                s                       1.51      1.35
                                N         2520          181       247

  Language Skills               Mean        5.24        5.04      5.82
                                s                       1.44      1.34
                                N         2520          181       248

  Work-Study Skills             Mean        5.46        5.41      5.86
                                s                       1.30      1.18
                                N         2520          181       248

  Arithmetic Skills             Mean        5.05        5.08      5.35
                                s                        .96      1.00
                                N         2520          181       247

  Composite                     Mean        5.35        5.27      5.77
                                s                       1.17      1.11
                                N         2520          181       245


2. Beta: A high positive beta is desired for the correct item choice
   and high negative betas are desired for the incorrect item choices.
   When beta is reported as equal to zero, one of three things is
   true:

   a. no one took the item choice,

   b. the biserial r is greater than or equal to 1.00, and beta cannot
      be computed but can be interpreted as approaching infinity (this
      is a peculiarity of the GITAP program), or

   c. beta is really zero and the item choice does not discriminate.
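
The biserial guideline stated for task scores (at least .3 for the
correct choice, at most -.3 for each incorrect choice) lends itself to
a simple screening pass, sketched below. The function name, the data
layout, and the example values are hypothetical. The threshold is left
as a parameter because the report says only that the limit was
increased for concept test scores, without giving the value.

```python
def flag_choices(biserials, correct, threshold=0.3):
    """Return the response choices whose biserial r misses the guideline.

    biserials maps each response choice to its biserial r;
    correct is the key of the correct choice.
    """
    flagged = []
    for choice, r in biserials.items():
        if choice == correct:
            meets_guideline = r >= threshold    # correct choice: r >= +.3
        else:
            meets_guideline = r <= -threshold   # incorrect choice: r <= -.3
        if not meets_guideline:
            flagged.append(choice)
    return flagged


# Hypothetical item: choice A is correct; C attracts able and less able alike.
needs_review = flag_choices({"A": 0.45, "B": -0.35, "C": -0.10, "D": -0.40}, correct="A")
```

In keeping with the report, a flag of this kind would only prompt a
closer look at an item; the indices were used as a general guide, not
as fixed cutoffs.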

The revised items can be found in Working Paper No. 58, "Items for
Measuring the Level of Attainment of Selected Classificatory Science
Concepts by Intermediate Grade Children" (Voelker & Sorenson, 1971).



Subjects 

The revised items were administered to 
a group of beginning sixth-grade children in 
Madison, Wisconsin, during the fall of 1970. 
The population consisted of all beginning 
sixth graders attending Madison public schools. 
A sufficient number of students was randomly
selected and invited to participate in the study
to result in approximately 200 boys and 200
girls taking the revised tests. The actual number
of students who completed the tests and for
whom test results were usable was 259 girls
and 186 boys. Each student who participated
in the study was paid a fee.

Data from the Lorge-Thorndike Intelligence Test and the Iowa Tests of
Basic Skills for the boys and girls who participated in the study, as
well as for the population from which they were drawn, are presented
in Table 2. In addition to data on the individual subjects themselves,
data regarding the nature of their fathers' occupations are presented
in Table 3.

Table 3

Distribution of Fathers' Occupations for Students in the Samples

                                          Girls    Boys

00.  Accountant                              4       7
01.  Architect                               3       1
02.  Dentist                                 3       1
03.  Engineer                               10       7
04.  Lawyer, Judge                           6       2
05.  Clergyman                               -       3
06.  Doctor                                 12       3
07.  Nurse                                   -       -
08.  Teacher, Professor                     20      15
09.  Other Professional                     26      15
11.  Farmer                                  -       -
21.  Owner of Business                       4       2
22.  Manager, Official                      28      13
31.  Bookkeeper                              -       -
32.  Receptionist                            1       -
39.  Other Clerical                          6       4
49.  Salesman                               27      24
51.  Craftsman, Skilled Worker              39      22
52.  Foreman                                 -       2
53.  Armed Services - Officer                -       1
54.  Armed Services - Enlisted               -       1
61.  Truck Driver                            5       4
62.  Operative in Factory                   16      11
69.  Other Operative                        12      12
71.  Fireman                                 2       2
72.  Policeman                               2       4
73.  Other Protective Service                3       -
74.  Practical Nurse, Nurse's Aide           1       1
75.  Private Household Worker                -       -
79.  Other Service Workers                  14      16
81.  Non-farm Laborer                        3       2
82.  Farm Laborer                            1       -
91.  Not presently in labor force            6       6
99.  Not ascertained                        12      10


Test Administration

The testing program was carried on in two situations. Since a large
percentage of the sixth grade students in the Madison public schools
attend one of three middle schools, it was decided to test the
selected students from these schools in their own buildings after
school hours. The remaining students who did not attend one of these
three schools were tested in three consecutive Saturday morning
sessions at centrally-located Madison schools. The test instruments
were administered in 2 1/2-hour testing sessions. During a given test
session each student received a science test consisting of 72 items,
which took approximately 1 hour to complete, and a similar test from
another subject matter area. There was approximately a 1/2-hour break
between taking the science test and the second test.


Data Analysis

The data obtained from the administration of the tests according to
the previously described procedures produced two types of information
for both the boys and the girls; separate analyses for boys and girls
were conducted throughout. For each of the 30 concept tests and the
12 task tests a reliability estimate was computed, and an item
analysis was conducted for each of the 360 items. Each item was
analyzed as part of two test scores, the concept test score and the
task test score. Total test score information and the individual item
data were examined in detail according to the item analysis
procedures described earlier.
III

Results and Discussion



The test data from the administration of the revised items to a
group of beginning sixth-grade students as previously described were
subjected to analysis with the GITAP program. In keeping with the
design of the study, separate analyses were performed for the boys
and the girls. Tables 4 and 5 present the means, standard deviations,
and Hoyt reliabilities for the concept test scores and the task test
scores. All concept test scores are based on 12 items, while the task
test scores are based on the analysis of 30 items.



Means

Tests of Concept Attainment 

The mean scores attained by the boys on the 30 tests of concept
attainment ranged from 6.62 to 9.61 out of a possible attainable score
of 12.00 (Table 4). For the tests in both the biological and the
physical science areas, the difference between lowest and highest mean
scores was approximately two points, whereas in the earth science
area, there was less than a 1-point difference between the lowest and
the highest mean score. Overall, the highest mean scores were obtained
in the biological science area and the lowest mean scores in the
physical science area.

The range of mean scores for the girls on the 30 concept attainment
tests was in excess of 4 points, from 6.04 to 10.44. The pattern of
mean scores for the concept tests within the specific science areas
was similar to that observed for the boys. The greatest differences
between the lowest and highest mean scores occurred in the biological
and physical science areas. Also, the highest means were obtained for
concept tests in the biological science area and the lowest means were
obtained for the concept tests in the physical science area.



The girls attained a higher mean score than the boys on 25 of the 30
concept tests. This overall pattern of mean scores for girls being
greater than that for boys was also noted in each of the three
specific science areas; girls had higher mean scores on eight or more
of the concept tests in each of the areas. (Note at this point that no
consideration has been given to differences in test reliabilities for
the boys and the girls. Note also that the tests have only 12 items.)

An examination of Table 4 reveals the fol- 
lowing: 

1. For both the boys and the girls the con-
cepts for which the highest mean scores
were attained were Mammal and Fish,
both from the biological science area.
The four concept tests on which the
subjects earned the lowest scores were
also identical for both the girls and
the boys: Invertebrate, Cell, Molecule,
and Conductor.

2. The ten concept tests with highest ranked
mean scores included nine of the same
concepts for the girls and the boys.
Five of these concepts were from the
biological science area and three from
the earth science area. Eight of the
ten concept tests with the lowest ranked
mean scores were the same for the girls
and the boys. Four were from the physi-
cal science area and three from the bio-
logical science area.

There was a difference of four to eight posi- 
tions in the rank order of the mean scores on 
seven of the 30 concept attainment tests. The 
mean test scores for boys were at a higher rank 
order for five of these seven. There was one 
biological science concept in this group of 
ranked differences, the mean score for girls

Table 4

Means, Standard Deviations, and Hoyt Reliabilities for
Tests of Concept Attainment: Science

                                        Standard
                            Mean        Deviation     Hoyt Reliability
     Concept             Boys  Girls   Boys  Girls     Boys   Girls

 1.  Bird                8.88   9.43   2.20   1.88      .62    .55
 2.  Cell                7.33   7.29   2.50   2.28      .61    .54
 3.  Fish                9.42  10.08   2.30   1.86      .71    .65
 4.  Heart (Human)       8.79   9.36   2.63   2.39      .74    .72
 5.  Invertebrate         ?     7.42   2.79   2.63      .73    .69
 6.  Lens (Eye)          7.87   8.08   2.47    ?         ?      ?
 7.  Lungs               8.95   9.45   2.82   2.61      .7?    .7?
 8.  Mammal              9.61  10.44   2.48   2.11      .76    .7?
 9.  Muscle              8.07   7.99   2.52   2.69      .67    .70
10.  Pore (Skin)         8.2?   8.85   2.68   2.73      .7?    .7?
11.  Cloud               8.22   8.67   2.58   2.03      .72    .58
12.  Core (Earth)        8.68   8.99   2.66   2.22      .75    .66
13.  Fossil              8.81   9.36   2.46   2.08      .70    .62
14.  Glacier             8.32   8.69   2.48   2.41      .66    .67
15.  Meteor              7.65   7.67   2.76   2.43      .72    .64
16.  Moon                8.57   8.34   2.76   2.85      .76    .78
17.  Planet              8.32   8.67   2.43   2.36      .68    .68
18.  Sedimentary Rock    7.91   8.75   2.68   2.50      .71    .72
19.  Volcano             8.78   9.27   2.33   2.08      .65    .60
20.  Wind                8.76   9.56   2.56   2.19      .71    .67
21.  Conductor           6.62   6.04   2.73   2.60      .68    .66
22.  Evaporation         7.99   8.29   2.71   2.53      .71    .67
23.  Expansion           7.51   7.80   2.74   2.77      .71    .73
24.  Friction            7.69   7.49   2.35   2.16      .62    .52
25.  Liquid              8.89   9.22   2.35   2.29      .67    .68
26.  Melting             7.75   8.37   2.40   2.16      .65    .62
27.  Molecule            6.62   6.99   2.48   2.33      .60    .56
28.  Solid               8.58   9.56   2.76   2.21      .77    .69
29.  Sound               8.16   8.54   2.49   2.27      .69    .66
30.  Thermometer         8.34   8.68   2.54   2.07      .71    .57

Note. A question mark stands for digits that are illegible in the source copy.


receiving the higher rank. In both the physical 
science and the earth science areas, there were
three concepts with wide differences in the rank 
orders of the mean scores. Mean test scores 
for the boys received the higher rank order for 
all three physical science concepts and two of 
the three earth science concepts. 



Tests of Task Attainment 

Data were also analyzed in terms of the
mean scores earned for each of the 12 tasks
across the 30 concepts (Table 5). Each of the
30 concept tests consisted of 12 items. The
girls attained higher mean scores than the boys
on each of the 12 task attainment tests. For
both the boys and the girls the first five tasks
from the schema referred to previously (page 2)
received the highest five ranks. The ranks did
not progress in order from one to five, but the
mean scores varied only slightly. For the boys
it is of note that the rank orders of the
mean scores for Tasks 6 and 7, selection of
relevant and irrelevant attributes, were 10 and
12, respectively. Ranks of the other means
appear to follow the general progression of
the schema. For the girls it is of note that



Table 5

Means, Standard Deviations, and Hoyt Reliabilities for
Tests of Task Attainment: Science

                                Standard
  Task            Mean          Deviation      Hoyt Reliability
 Number^a      Boys   Girls    Boys   Girls     Boys    Girls

    1         23.17   24.54    5.14   4.51       .84     .83
    2         22.22   23.44    5.74   4.80       .87     .84
    3         23.50   24.11    4.46   3.60       .80     .72
    4         23.34   23.65    4.20   3.38       .76     .66
    5         22.95   23.57    5.36   4.30       .85     .78
    6         18.76   20.18    6.10   5.61       .85     .83
    7         16.76   18.05    6.30   5.74       .85     .83
    8         20.17   21.37    6.81   5.76       .89     .85
    9         19.06   20.26    6.48   5.99       .87     .86
   10         20.67   21.04    6.50   5.94       .88     .87
   11         18.82   19.49    5.66   4.81       .83     .77
   12         17.32   17.63    5.90   5.52       .83     .81

^a 1. Given name of attribute, select example of attribute.
   2. Given example of attribute, select name of attribute.
   3. Given name of concept, select example of concept.
   4. Given name of concept, select nonexample of concept.
   5. Given example of concept, select name of concept.
   6. Given name of concept, select relevant attribute.
   7. Given name of concept, select irrelevant attribute.
   8. Given definition of concept, select name of concept.
   9. Given name of concept, select definition of concept.
  10. Given name of concept, select supraordinate concept.
  11. Given name of concept, select subordinate concept.
  12. Given names of two concepts, select principle relating them.



the rank order of the mean score on Task 7 was
11. All else seems to follow the general order 
of the schema. 
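
The rank orders discussed above can be rechecked directly from the task means. The following is a minimal sketch that assumes the means as read from Table 5; the dictionary and function names are illustrative only.

```python
# Mean task scores as read from Table 5 (task number -> mean).
boys_means = {1: 23.17, 2: 22.22, 3: 23.50, 4: 23.34, 5: 22.95, 6: 18.76,
              7: 16.76, 8: 20.17, 9: 19.06, 10: 20.67, 11: 18.82, 12: 17.32}
girls_means = {1: 24.54, 2: 23.44, 3: 24.11, 4: 23.65, 5: 23.57, 6: 20.18,
               7: 18.05, 8: 21.37, 9: 20.26, 10: 21.04, 11: 19.49, 12: 17.63}


def rank_order(means):
    """Return {task: rank}, where rank 1 is the highest mean.

    Ties are not handled specially; none occur in these data.
    """
    ordered = sorted(means, key=means.get, reverse=True)
    return {task: rank for rank, task in enumerate(ordered, start=1)}


boys_rank = rank_order(boys_means)
girls_rank = rank_order(girls_means)

# Tasks that fill the five highest ranks in each group.
boys_top_five = {task for task, rank in boys_rank.items() if rank <= 5}
girls_top_five = {task for task, rank in girls_rank.items() if rank <= 5}

# Tasks on which the girls' mean exceeds the boys' mean.
girls_higher = [task for task in boys_means if girls_means[task] > boys_means[task]]
```

With these values, `boys_rank[6]` and `boys_rank[7]` are 10 and 12, `girls_rank[7]` is 11, and Tasks 1 through 5 fill the top five ranks for both groups, in agreement with the text.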

Standard Deviations 

The standard deviations for the tests of
concept attainment were greater for the boys
than for the girls on 26 of the 30 concept tests.
In the biological science area, the standard
deviations for boys were greater than those
for girls on eight of the ten tests, and in both
the earth science and the physical science
areas, the standard deviations for boys were
greater than those for girls on nine of the ten
tests. The standard deviations on the task
test scores were greater for the boys in all 12
instances.



Reliabilities 

Hoyt reliabilities on the tests of concept 
attainment for the boys were greater than or 
equal to those for the girls on 23 of the 30
tests. In the earth science area, the test
reliabilities for boys were greater than or
equal to those for the girls on seven of the
ten tests. In the biological and physical sci- 
ence areas, the test reliabilities for boys were 
greater than or equal to those for the girls on 
eight of the ten tests. On the task tests the 
reliabilities for the boys were greater than 
those for the girls in every instance, the 
largest difference in reliabilities being 0.10. 

Item Indices

The item indices for the 360 items prepared
for use in this study (12 tasks, 30 concepts)
are presented in Table 6. Again, the
analyses for the boys and the girls are pre-
sented separately, and the information pro-
vided for each item includes the proportion
correct, the biserial correlation, the X50,
and beta. Except for the proportion correct,
which is the same whether the item was part
of the concept test score or the task test score,
the item indices are presented for
the way in which the item performed as part
of the concept test score and as part of the task
test score. (The actual performances required
by the respective tasks are listed at the bottom
of Table 5.) Note that the item indices for the
respective items are presented for the correct
choice only. If one wishes to identify the
specific concept and/or task for which data
are presented, it is necessary to refer back
to Tables 4 and 5. For example, Concept No.
2 as indicated in Table 6 would be Cell, and the
tasks 1-12 as indicated in Column 2 of Table
6 are for that respective concept. The third
column in Table 6, in which the items are num-
bered consecutively from 1 to 360, is simply
an ordering device and does not present any
additional information regarding item indices
on concept test scores or task test scores.
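
Two of the indices named above, the proportion correct and the item-criterion biserial correlation, can be illustrated with the standard textbook formulas. This is a sketch under stated assumptions, not a description of how GITAP computes them: the function names and the small data set are hypothetical, and the criterion standard deviation is taken over all subjects.

```python
from statistics import NormalDist, mean, pstdev


def proportion_correct(item):
    """Share of subjects choosing the correct answer (item is a list of 0/1)."""
    return sum(item) / len(item)


def biserial(item, criterion):
    """Textbook biserial correlation between a 0/1 item and a criterion score.

    r_bis = (M_pass - M_fail) / s * (p * q / y), where y is the height of the
    standard normal curve at the point that cuts off the proportion p.
    """
    p = proportion_correct(item)
    if not 0.0 < p < 1.0:
        raise ValueError("undefined when every subject passes or every subject fails")
    q = 1.0 - p
    passed = [c for x, c in zip(item, criterion) if x == 1]
    failed = [c for x, c in zip(item, criterion) if x == 0]
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(passed) - mean(failed)) / pstdev(criterion) * (p * q / y)


# Hypothetical data: eight subjects, one item, and their criterion scores.
demo_item = [1, 0, 1, 1, 0, 1, 0, 0]
demo_criterion = [9, 8, 7, 6, 5, 4, 3, 2]
demo_p = proportion_correct(demo_item)
demo_r = biserial(demo_item, demo_criterion)
```

Note that `proportion_correct` never looks at the criterion, which is why that index is identical whether the item is evaluated against the concept score or the task score, while the biserial correlation changes with the criterion supplied.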



Item Difficulty 

As indicated earlier in this paper, the
mean scores on the tests indicated that the
tests were more difficult for the boys than
they were for the girls. This was evidenced
by the fact that the girls attained higher mean
scores on 26 of the 30 concept tests. Usually,
however, it is not good practice to judge the
difficulty of a test and its respective items on
the basis of mean scores. More precise mea-
sures of item difficulty can be obtained from
data regarding the proportion of items correct
and the X50.

As can be seen from Table 6, the girls
attained a higher proportion of correct responses
on a majority of the test items. On 22 of the
30 concept attainment tests, the girls had the
highest proportion correct on more than 50% of
the items, and on 16 of the 30 tests the pro-
portion correct for the girls was higher than
for boys on over two-thirds of the items.

The proportion correct as a means of judg-
ing item difficulty gives a more precise indi-
cation of item difficulty than a mean test score
but is not as precise as the X50. Further, the
proportion correct does not give a good mea-
sure of the way in which an item is performing
in reference to a criterion score. For instance,
in this study, the proportion correct remains
the same for both the criterion scores used,
namely the concept attainment score and the
task attainment score. The index tells how
many students chose the correct answer
for an item but it says nothing about their abil-
ity level as measured by the specific criterion
score.

A more precise measure of item difficulty 
than either the mean scores or proportion cor- 
rect is the item difficulty index X50. This 
difficulty index gives in standard deviation 
units the criterion score at which a subject 
would have a 50-50 chance of getting the 
item correct. For example, an X50 value of
1.50 for an item indicates that subjects with
a criterion score 1.50 standard deviation units
above the mean have a 50% chance of answer-
ing that item correctly. Subjects with a cri-
terion score higher than this would have a
greater chance of answering that item correctly
and subjects with a criterion score lower than
this would have a lesser chance. Likewise,
an X50 value of -1.50 indicates that subjects
with a criterion score 1.50 standard deviation
units below the mean would have a 50% chance
of getting that item correct. For a higher score
the chance would be greater and for a lower
score the chance would be less. Knowledge 
of both the X50 and the beta for a specific 
item permits one to easily determine the 
probability of answering an item correctly 
for any point on a criterion scale (Baker, 1964). 
It may be pointed out that when p = .50, X50 = 
.00; when p is greater than .50 then X50 will 
be negative; and for a certain p the higher the 
beta value the closer to zero will be the X50 
value. 
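
The way X50 and beta jointly fix the chance of a correct answer can be sketched as follows. The sketch assumes the normal ogive item characteristic curve commonly associated with these indices, P(z) = Phi(beta * (z - X50)), with the criterion score z in standard deviation units, and the relation X50 = gamma / r, where gamma is the normal deviate cutting off the proportion correct p in the upper tail and r is the biserial correlation. Both forms and all names are assumptions of this illustration, not statements about GITAP.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def prob_correct(z, x50, beta):
    """Assumed normal ogive: chance of a correct answer at criterion score z."""
    return STD_NORMAL.cdf(beta * (z - x50))


def x50_from_p(p, r_bis):
    """Assumed relation X50 = gamma / r_bis, gamma = normal deviate with area p above it."""
    return STD_NORMAL.inv_cdf(1.0 - p) / r_bis


# A subject exactly at X50 has a 50-50 chance, whatever the value of beta.
at_x50 = prob_correct(1.50, x50=1.50, beta=0.80)
above_x50 = prob_correct(2.00, x50=1.50, beta=0.80)
below_x50 = prob_correct(1.00, x50=1.50, beta=0.80)

# For a fixed p, a higher biserial (hence a higher beta) pulls X50 toward zero.
easy_item_low_r = x50_from_p(0.70, r_bis=0.40)
easy_item_high_r = x50_from_p(0.70, r_bis=0.80)
```

Under these assumptions, `x50_from_p(0.50, r)` is zero for any r, any p above .50 gives a negative X50, and `easy_item_high_r` lies closer to zero than `easy_item_low_r`, which matches the three properties stated in the text.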

On the basis of the X50 scores for the 
items on the tests of concept attainment and 
task attainment, there is further support for 
the fact that the test items were more difficult 
for the boys than they were for the girls. The
X50 values favored the girls on 248 of the 360
items when the concept score was the criterion
score and on 239 of the items when the task
score was the criterion score. Of note is that
for 215 of these items the X50 item indices
favor the girls on both the concept test score 
and the task test score. 

Further information about the appropriate-
ness of an item and its respective distractors
can be obtained from an examination of the
biserial correlation and beta. These indices
are quite closely related since beta is com-
puted as a function of the biserial correlation
(Baker, 1969). However, the relationship is
not linear. From .00 to about .30 (absolute)
they are very nearly the same. Beyond this
point beta begins to increase quite rapidly in
magnitude in comparison to the biserial cor-
relation. It may be pointed out that beta is
always equal to or greater (absolute) than the
biserial correlation. As a general rule, .30
is often used as a lower cutting point for a
desirable biserial correlation or a beta. For
a total score composed of relatively few items,
as is the concept score, a much higher mini-
mum would be desirable.
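
The nonlinear relation between the two indices can be made concrete with the usual normal ogive formula, beta = r / sqrt(1 - r^2), where r is the biserial correlation. That this is the exact function used in the study is an assumption of this sketch; the function name is illustrative.

```python
import math


def beta_from_biserial(r_bis):
    """Assumed relation: beta = r / sqrt(1 - r**2), defined for -1 < r < 1."""
    if not -1.0 < r_bis < 1.0:
        raise ValueError("biserial correlation must lie strictly between -1 and 1")
    return r_bis / math.sqrt(1.0 - r_bis ** 2)


# Tabulate beta against the biserial correlation to show the widening gap.
comparison = [(r, beta_from_biserial(r)) for r in (0.10, 0.30, 0.50, 0.70, 0.90)]
for r, beta in comparison:
    print(f"biserial = {r:.2f}   beta = {beta:.2f}")
```

Under this relation the two values differ by less than .02 up to a biserial of .30, beta reaches .75 at a biserial of .60, and beta is never smaller in absolute value than the biserial correlation.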

It can be noted from Table 6 that only 36
of the 360 items (10%) had betas less than
0.30 when functioning as either part of a con-
cept criterion score or a task criterion score.
It is of further note that when raising the beta
level to 0.40 only one-sixth of the test items
have betas less than this value when
functioning as either part of a concept criterion
score or a task criterion score. Of those
items which had betas less than 0.40, 52
were weaker items when they functioned as
part of a task criterion score. A comparison
of items for the girls and the boys indicates
no preference for one group over the other in
terms of the magnitude of the betas.

In general, only 10 of the 360 items func-
tioned poorly as part of both the concept cri-
terion score and the task criterion score.

Conclusions



The discussion of the means, standard
deviations, and Hoyt reliability estimates pre-
sented in this paper for the concept attain-
ment tests and the task attainment tests indi-
cates that the tests as constructed are appro-
priate for facilitating the overall objectives
of the project entitled "A Structure of Concept
Attainment Abilities." Further support for the
appropriateness of these test instruments is
obtained from the examination of four different
item indices: proportion correct, item-criterion
biserial correlation, X50, and beta. The analy-
ses of these four indices based on each of
two criterion scores indicate that it is appro-
priate to use these tests to continue
the study of the relationship between con-
cept attainment, task performance, and se-
lected measures of cognitive ability. The
item indices show that these items are of
appropriate difficulty level for use with sub-
jects at this age level. The analyses of these
data further indicate that the majority
of the items have desirable levels of discrim-
ination both as part of a concept criterion
score and as part of a task criterion score.